Selective inline expansion for improvement of multi grain parallelism
نویسندگان
چکیده
This paper proposes a selective procedure inlining scheme to improve a multi-grain parallelism, which hierarchically exploits the coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block in addition to the loop parallelism. Using the proposed scheme, the parallelism among different layers(nested levels) can be exploited. In the evaluation using 103.su2cor, 107.mgrid and 125.turb3d in SPEC95FP benchmarks on 16 way IBM pSeries690 SMP server, the multi-grain parallel processing with the proposed scheme gave us 3.65 to 5.34 times speedups against IBM XL Fortran compiler and 1.03 to 1.47 times speedups against conventional multi-grain parallelization.
منابع مشابه
Near Fine Grain Parallel Processing Using Static Scheduling on Single Chip Multiprocessors
With the increase of the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of effective performance of a processor are getting important problems. However, it has been thought that popular superscalar and VLIW would have difficulty to obtain scalable improvement of effective performance in future because of the limitation of instruction level para...
متن کاملA multigrain Delaunay mesh generation method for multicore SMT-based architectures
Given the proliferation of layered, multicoreand SMT-based architectures, it is imperative to deploy and evaluate important, multi-level, scientific computing codes, such as meshing algorithms, on these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level, medium-grain at the cavity level and fine-grain at the elem...
متن کاملFast thread communication and synchronization mechanisms for a scalable single chip multiprocessor
Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Smaller feature sizes have decreased the transistor switching time but at the same time increased the resistance of interconnect wires, resulting in slower signal transmission in on-chip wiring. Since future chips will have more silico...
متن کاملRuntime scheduling of dynamic parallelism on accelerator-based multi-core systems
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multicore processors. Heterogeneous multi-core processors integrate conventional cores that run legacy codes with specialized cores that serve as computational accelerators. The term multi-grain parallelism refers to the exposure of multiple dimensions of parallelism from within the runtim...
متن کاملA Dynamic Analysis Tool for Finding Coarse-Grain Parallelism
While the chip multiprocessor (CMP) has quickly become the predominant processor architecture, its continuing success largely depends on the parallelizability of complex programs. In the early 1990s great successes were obtained to extract parallelism from the inner loops of scientific computations. General-purpose programs, however, stayed out-of-reach due to the complexity of their control fl...
متن کامل